A Comparison of Two Quantization Techniques for Speech Spectral Parameters
John Leis
Faculty of Engineering, University of Southern Queensland, Toowoomba, Australia

Sridha Sridharan
Signal Processing Research Centre, Queensland University of Technology, Brisbane, Australia

ABSTRACT

The major contributor to the overall bit rate in low-rate speech coders is the encoding of the short-term spectral parameters. Current schemes typically use 32-40 bits per coding frame to represent this information with scalar quantization. Several researchers have recently investigated vector quantization (VQ) of these parameters and found that direct VQ requires a prohibitively large codebook. We report here on experiments, carried out over a large number of speech frames, that apply vector quantization to the line spectral frequencies. Specifically, we set out to determine the influence of the clustering algorithm used in training, and the number of training vectors required for good generalization and hence a robust codebook.

1. SPECTRUM CODING

The well-known Linear Predictive Coding (LPC) model is normally used to remove short-term redundancies within a frame of speech. For quantization (scalar or vector), the Line Spectrum Frequency (LSF) representation is used; LSFs are also referred to as Line Spectrum Pairs (LSPs). Given the LPC model with coefficients a_i, the LSF representation is found as in [3]. The spectral distortion is calculated using

SD^2 = \frac{1}{N} \sum_{n=0}^{N-1} \left[ 20 \log_{10} \frac{|S(n)|}{|\hat{S}(n)|} \right]^2    (1)

where |S(n)| and |\hat{S}(n)| are the magnitudes of the original and quantized LPC spectra at the n-th of N frequency points.

2. RESULTS

The algorithms compared were the Pairwise Nearest Neighbour (PNN) algorithm [1] and the Generalized Lloyd Algorithm (GLA) [2]. The speech was sampled at 16 kHz and split into frames of 320 samples (20 ms), giving 50 frames per second. A Hamming window was applied, and the 10 LPC coefficients of each frame were converted into 10 LSFs.
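The paper defers the LPC-to-LSF conversion to [3], so the following is only an illustrative Python/NumPy sketch, not the authors' procedure. It assumes the autocorrelation method (Levinson-Durbin recursion on a Hamming-windowed frame) with A(z) = 1 + sum_i a_i z^{-i}, and finds the LSFs as the angles of the unit-circle roots of P(z) = A(z) + z^{-(p+1)} A(z^{-1}) and Q(z) = A(z) - z^{-(p+1)} A(z^{-1}). The function names `lpc_autocorrelation` and `lpc_to_lsf` are hypothetical, and a random noise frame stands in for real speech.

```python
import numpy as np

def lpc_autocorrelation(frame: np.ndarray, order: int = 10) -> np.ndarray:
    """Return [1, a_1, ..., a_p] for A(z) = 1 + sum a_i z^-i (autocorrelation method)."""
    windowed = frame * np.hamming(len(frame))
    # Biased autocorrelation at lags 0..order
    r = np.array([np.dot(windowed[: len(windowed) - k], windowed[k:])
                  for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the Levinson-Durbin recursion
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_to_lsf(a: np.ndarray) -> np.ndarray:
    """Convert [1, a_1, ..., a_p] to p line spectral frequencies in radians, in (0, pi)."""
    a_ext = np.concatenate([a, [0.0]])
    a_rev = a_ext[::-1]        # coefficients of z^-(p+1) A(z^-1)
    p_poly = a_ext + a_rev     # symmetric polynomial P(z)
    q_poly = a_ext - a_rev     # antisymmetric polynomial Q(z)
    angles = np.concatenate([np.angle(np.roots(p_poly)), np.angle(np.roots(q_poly))])
    # Keep one angle per conjugate pair; drop the trivial roots at z = +1 and z = -1
    eps = 1e-6
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])

fs = 16000                         # sampling rate used in the experiments
rng = np.random.default_rng(0)
frame = rng.standard_normal(320)   # stand-in for one 320-sample (20 ms) frame
a = lpc_autocorrelation(frame, order=10)
lsf = lpc_to_lsf(a)
print(np.round(lsf * fs / (2 * np.pi)))  # the 10 LSFs expressed in Hz
```

Root-finding is used here for readability; practical coders usually search for the roots numerically on a frequency grid instead, which is cheaper.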
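Equation (1) can be evaluated directly from two sets of LPC coefficients. This sketch assumes that |S(n)| is the all-pole magnitude 1/|A(e^{jw_n})| sampled at N uniformly spaced frequencies in [0, pi) (N = 256 here, an arbitrary choice) and that the gain term is omitted; the paper does not state N or the frequency grid, so both are assumptions. It returns SD in dB, i.e. the square root of SD^2. The function names are hypothetical.

```python
import numpy as np

def lpc_magnitude_spectrum(a: np.ndarray, n_points: int = 256) -> np.ndarray:
    """|S(n)| = 1 / |A(e^{j w_n})| at w_n = n * pi / n_points, n = 0..n_points-1."""
    # A zero-padded FFT of the coefficients [1, a_1, ..., a_p] evaluates A(e^{jw})
    a_response = np.fft.rfft(a, 2 * n_points)[:n_points]
    return 1.0 / np.abs(a_response)

def spectral_distortion_db(a: np.ndarray, a_hat: np.ndarray, n_points: int = 256) -> float:
    """Square root of SD^2 in eq. (1), in dB, between original and quantized LPC models."""
    s = lpc_magnitude_spectrum(a, n_points)
    s_hat = lpc_magnitude_spectrum(a_hat, n_points)
    diff_db = 20.0 * np.log10(s / s_hat)
    return float(np.sqrt(np.mean(diff_db ** 2)))

# Example: a first-order model and a slightly perturbed ("quantized") version of it
a = np.array([1.0, -0.9])
a_hat = np.array([1.0, -0.85])
sd = spectral_distortion_db(a, a_hat)
print(f"SD = {sd:.2f} dB")
```

Averaging this per-frame value over all test frames gives an average SD figure of the kind reported in the results.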
The GLA and PNN training was implemented in MATLAB [7]. The VQ codebooks were trained on approximately 32k speech frames from the "Train" section of the TIMIT DR-2 speech database [6], and testing used 10,000 vectors from the TIMIT DR-2 "Test" section, which lie outside the training set.

Figure (legend): average spectral distortion (SD) for codebook sizes of 9 to 13 bits.

Codebook size    Average SD
9 bits           3.3 dB
10 bits          3.0 dB
11 bits          2.8 dB
12 bits          2.7 dB
13 bits          2.5 dB
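The two training algorithms can be contrasted on toy data. This is a minimal NumPy illustration, not the authors' MATLAB code. It assumes squared Euclidean distortion on LSF vectors during training (the paper does not state the training distortion measure), implements PNN in its textbook form (start with one cluster per training vector and repeatedly merge the pair whose merge adds the least squared error, n_i n_j / (n_i + n_j) ||c_i - c_j||^2), and implements GLA as Lloyd iterations from a random initial codebook. The sorted uniform vectors are a hypothetical stand-in for real LSFs, and all function names are hypothetical.

```python
import numpy as np

def squared_distances(x: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance from every row of x to every codeword."""
    return np.sum((x[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)

def mse(x: np.ndarray, codebook: np.ndarray) -> float:
    """Mean squared error when each vector is replaced by its nearest codeword."""
    return float(np.mean(np.min(squared_distances(x, codebook), axis=1)))

def gla_codebook(train: np.ndarray, init: np.ndarray, iters: int = 20) -> np.ndarray:
    """Generalized Lloyd Algorithm: alternate nearest-neighbour partitioning
    and centroid updates, starting from an initial codebook."""
    codebook = np.array(init, dtype=float)
    for _ in range(iters):
        nearest = np.argmin(squared_distances(train, codebook), axis=1)
        for k in range(len(codebook)):
            members = train[nearest == k]
            if len(members) > 0:  # an empty cell keeps its old codeword
                codebook[k] = members.mean(axis=0)
    return codebook

def pnn_codebook(train: np.ndarray, size: int) -> np.ndarray:
    """Pairwise Nearest Neighbour: start with one cluster per training vector and
    repeatedly merge the pair whose merge adds the least squared error."""
    centroids = np.array(train, dtype=float)
    counts = np.ones(len(centroids))
    while len(centroids) > size:
        d2 = squared_distances(centroids, centroids)
        # Merge cost n_i * n_j / (n_i + n_j) * ||c_i - c_j||^2
        weight = np.outer(counts, counts) / (counts[:, None] + counts[None, :])
        cost = weight * d2
        np.fill_diagonal(cost, np.inf)
        i, j = np.unravel_index(np.argmin(cost), cost.shape)
        merged_n = counts[i] + counts[j]
        centroids[i] = (counts[i] * centroids[i] + counts[j] * centroids[j]) / merged_n
        counts[i] = merged_n
        centroids = np.delete(centroids, j, axis=0)
        counts = np.delete(counts, j)
    return centroids

# Toy comparison on synthetic, sorted (LSF-like) 10-dimensional vectors in (0, pi)
rng = np.random.default_rng(1)
train = np.sort(rng.uniform(0.0, np.pi, size=(200, 10)), axis=1)
test = np.sort(rng.uniform(0.0, np.pi, size=(100, 10)), axis=1)

cb_pnn = pnn_codebook(train, 16)  # 16 codewords, i.e. a 4-bit codebook
init = train[rng.choice(len(train), 16, replace=False)]
cb_gla = gla_codebook(train, init)

for name, cb in [("PNN", cb_pnn), ("GLA", cb_gla)]:
    print(f"{name}: train MSE {mse(train, cb):.4f}, test MSE {mse(test, cb):.4f}")
```

Comparing train and test error is a rough proxy for the generalization question raised in the abstract. This naive PNN recomputes every pairwise merge cost at each step, so it only suits small toy sets like this one and would not scale directly to the roughly 32k training frames described above.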
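To put these codebook sizes in context, the following arithmetic assumes that the 50 frames/s rate of these experiments also applies to the 32-40 bit scalar schemes mentioned in the abstract (those schemes may in fact use other frame rates) and that each codeword stores 10 LSF values.

```latex
\begin{aligned}
R_{\text{scalar}} &= (32 \text{ to } 40)\,\tfrac{\text{bits}}{\text{frame}} \times 50\,\tfrac{\text{frames}}{\text{s}} = 1600 \text{ to } 2000\ \text{bit/s} \\
R_{\text{VQ}} &= (9 \text{ to } 13)\,\tfrac{\text{bits}}{\text{frame}} \times 50\,\tfrac{\text{frames}}{\text{s}} = 450 \text{ to } 650\ \text{bit/s} \\
M &= 2^{b} = 512 \text{ to } 8192 \text{ codewords}, \qquad 10M = 5120 \text{ to } 81920 \text{ stored LSF values}
\end{aligned}
```

Each additional bit doubles the number of codewords, so the storage and full-search cost of direct VQ grows quickly as the bit allocation approaches the 32-40 bits used by scalar schemes.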
Publication date: 1996